Lab 4 - Data Visualization

Author

Christian Navarro

#install.packages(c("data.table","leaflet"))
library(data.table)
library(leaflet)
library(tidyverse)

Learning Goals

  • Read in and prepare the meteorological dataset
  • Create several graphs with different geoms() in ggplot2
  • Create a facet graph
  • Customize your plots
  • Create a more detailed map using leaflet()

Lab Description

We will again work with the meteorological data presented in lecture.

The objective of the lab is to examine the association between weekly average dew point and wind speed in four regions of the US and by elevation.

Per Wikipedia: “The dew point of a given body of air is the temperature to which it must be cooled to become saturated with water vapor. This temperature depends on the pressure and water content of the air.”

Again, feel free to supplement your knowledge of this dataset by checking out the data dictionary.

Steps

1. Read in the data

First download and then read in with data.table::fread()

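# download.file() has no timeout argument (extra arguments are ignored);
# the limit is set through options() instead
options(timeout = 60)
if (!file.exists("met_all.gz"))
  download.file(
    url = "https://raw.githubusercontent.com/USCbiostats/data-science-data/master/02_met/met_all.gz",
    destfile = "met_all.gz",
    method   = "libcurl"
    )
met <- data.table::fread("met_all.gz")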

2. Prepare the data

  • Remove temperatures less than -17C
  • Make sure there is no missing data in the key variables coded as 9999, 999, etc.
  • Generate a date variable using the functions as.Date() (hint: You will need the following to create a date paste(year, month, day, sep = "-")).
  • Using the data.table::week function, keep the observations of the first week of the month.
  • Compute the mean by station of the variables temp, rh, wind.sp, vis.dist, dew.point, lat, lon, and elev.
  • Create a region variable for NW, SW, NE, SE based on lon = -98.00 and lat = 39.71 degrees
  • Create a categorical variable for elevation as in the lecture slides
met <- met[temp > -17][elev == 9999.0, elev := NA]
met[, week := week(as.Date(paste(year, month, day, sep = "-")))]
met <- met[week == min(week, na.rm = TRUE)]

met_avg <- met[,.(temp=mean(temp,na.rm=TRUE), rh=mean(rh,na.rm=TRUE), wind.sp=mean(wind.sp,na.rm=TRUE), 
                vis.dist=mean(vis.dist,na.rm=TRUE), dew.point = mean(dew.point, na.rm=TRUE), lat=mean(lat), lon=mean(lon), 
                elev=mean(elev,na.rm=TRUE)), by="USAFID"]

met_avg$elev_cat <- ifelse(met_avg$elev> 252, "high", "low")

met_avg$region <- ifelse(met_avg$lon > -98 & met_avg$lat >39.71, "north east",
                         ifelse(met_avg$lon > -98 & met_avg$lat < 39.71, "south east",
                                ifelse(met_avg$lon < -98 & met_avg$lat >39.71, "north west", "south west")))

table(met_avg$region)

north east north west south east south west 
       484        146        649        296 
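
The preparation steps above can be tried on a small table before running them on the full dataset. The following is a minimal sketch under stated assumptions: the rows in `toy` are invented for illustration, the short labels (NE, NW, SE, SW) and the use of `>=` at the two cut points are choices made here, not part of the lab. It also shows why station means can end up as `NaN`: a station whose values are all missing gives `mean(..., na.rm = TRUE)` nothing to average.

```r
library(data.table)

# Toy stand-in for the met data; every value here is invented.
toy <- data.table(
  USAFID = c(1, 1, 2, 2, 3),
  year   = 2019, month = 8, day = c(1, 2, 1, 9, 3),
  temp   = c(20.5, 22.0, 19.0, -40.0, 25.0),
  elev   = c(100, 100, 9999, 9999, 300),
  lat    = c(45, 45, 30, 30, 41),
  lon    = c(-120, -120, -90, -90, -80)
)

# Recode the sentinel value to NA and drop implausible temperatures.
toy[elev == 9999, elev := NA]
toy <- toy[temp > -17]

# Build a Date column, derive the week number, keep the earliest week.
toy[, date := as.Date(paste(year, month, day, sep = "-"))]
toy[, week := week(date)]
toy <- toy[week == min(week, na.rm = TRUE)]

# Station-level means; station 2 has no valid elevation, so its mean is NaN.
toy_avg <- toy[, .(temp = mean(temp, na.rm = TRUE),
                   elev = mean(elev, na.rm = TRUE),
                   lat  = mean(lat),
                   lon  = mean(lon)), by = USAFID]

# Region from the two cut points named in the lab (lon = -98, lat = 39.71).
toy_avg[, region := fifelse(lat >= 39.71,
                            fifelse(lon >= -98, "NE", "NW"),
                            fifelse(lon >= -98, "SE", "SW"))]

print(toy_avg)
```

Nesting `fifelse()` on latitude first and longitude second guarantees that every station with valid coordinates lands in exactly one region, including stations that sit exactly on a cut point.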

3. Use geom_violin to examine the wind speed and dew point by region

You saw how to use geom_boxplot in class. Try using geom_violin instead (take a look at the help). (hint: you will need to set the x aesthetic to 1)

  • Use facets
  • Make sure to deal with NAs
  • Describe what you observe in the graph
met_avg %>%
  filter(!is.na(wind.sp)) %>%  # is.na() also catches the NaN means that %in% NA misses
  ggplot() +
  geom_violin(mapping = aes(y = wind.sp, x = 1)) +
  facet_wrap(~region, nrow = 2)

met_avg %>%
  filter(!is.na(dew.point)) %>%
  ggplot() +
  geom_violin(mapping = aes(y = dew.point, x = 1, fill = region)) +
  facet_wrap(~region, nrow = 2)

The violin plot showed regional patterns in wind speed distributions.
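
Both variables can also be drawn in a single faceted figure by reshaping to long format first. This is a sketch on simulated data, not the lab's result: `toy_avg`, its values, and the column names `variable` and `value` are assumptions made for the example.

```r
library(data.table)
library(ggplot2)

set.seed(1)
# Simulated station means; one wind speed is NaN to mimic a station
# whose readings were all missing.
toy_avg <- data.table(
  region    = rep(c("north east", "north west", "south east", "south west"),
                  each = 25),
  wind.sp   = c(rnorm(99, mean = 2.5, sd = 0.8), NaN),
  dew.point = rnorm(100, mean = 15, sd = 4)
)

# One row per station and variable, then drop NA and NaN values.
toy_long <- melt(toy_avg, id.vars = "region",
                 measure.vars = c("wind.sp", "dew.point"),
                 variable.name = "variable", value.name = "value")
toy_long <- toy_long[!is.na(value)]

# Rows are variables, columns are regions; each row gets its own y scale.
p <- ggplot(toy_long, aes(x = 1, y = value)) +
  geom_violin() +
  facet_grid(variable ~ region, scales = "free_y") +
  labs(x = NULL, y = NULL)
print(p)
```

Filtering after the reshape removes only the missing measurement, so a station that lacks wind speed still contributes its dew point.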

4. Use geom_jitter with stat_smooth to examine the association between dew point and wind speed by region

  • Color points by region
  • Make sure to deal with NAs
  • Fit a linear regression line by region
  • Describe what you observe in the graph
met_avg %>%
  filter(!is.na(dew.point), !is.na(wind.sp)) %>%
  ggplot(mapping = aes(x = dew.point, y = wind.sp, color = region)) +
  geom_jitter() +
  stat_smooth(method = "lm")
`geom_smooth()` using formula = 'y ~ x'

The Southeast region has the highest dew point values, indicating it is the most humid area with the greatest atmospheric moisture. The Northwest shows the lowest dew point values, making it the driest region with the least humidity.
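
The fitted lines drawn by `stat_smooth(method = "lm")` can be checked numerically by fitting the same regression within each region. The sketch below uses simulated data with slopes chosen by hand (negative in one region, positive in the other), so the numbers say nothing about the real stations; `toy_avg` and `slopes` are names introduced for the example.

```r
library(data.table)

set.seed(42)
# Simulated station means for two regions.
toy_avg <- data.table(
  region    = rep(c("north east", "south west"), each = 50),
  dew.point = c(runif(50, 10, 20), runif(50, 0, 15))
)
# Wind speed depends on dew point with a different slope per region.
toy_avg[, wind.sp := fifelse(region == "north east",
                             3 - 0.05 * dew.point,
                             2 + 0.10 * dew.point) + rnorm(.N, sd = 0.1)]

# One linear regression per region; keep only the dew point coefficient.
slopes <- toy_avg[!is.na(dew.point) & !is.na(wind.sp),
                  .(slope = coef(lm(wind.sp ~ dew.point))[["dew.point"]]),
                  by = region]
print(slopes)
```

A table of slopes like this makes it easier to state whether the association is positive or negative in each region than reading it off the plot.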

5. Use geom_bar to create barplots of the weather stations by elevation category colored by region

  • Bars by elevation category using position="dodge"
  • Change colors from the default. Color by region using scale_fill_brewer (see this)
  • Create nice labels on the axes and add a title
  • Describe what you observe in the graph
  • Make sure to deal with NA values
met_avg %>%
  filter(!is.na(region), !is.na(elev_cat)) %>%  # stations without elevation have no category
  ggplot() +
  geom_bar(mapping = aes(x = elev_cat, fill = region), position = "dodge") +
  scale_fill_brewer(palette = "PuOr") +
  labs(title = "Number of weather stations by elevation category and region", x = "Elevation Category", y = "Count") +
  theme_bw()

The Northwest has a strong presence in both high and low elevation stations, but the Northeast dominates in the high-elevation count. The Southeast and Southwest have predominantly low-elevation stations, with the Southeast having the most stations overall but concentrated at lower elevations.
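
The bar heights can be confirmed with a count table. This is a small sketch with invented rows; on the real data the same expression would be applied to `met_avg`.

```r
library(data.table)

# Invented station-level rows, for illustration only.
toy_avg <- data.table(
  elev_cat = c("high", "high", "low", "low", "low", NA),
  region   = c("north west", "north west", "south east",
               "south east", "north west", "south east")
)

# Count stations per elevation category and region, skipping missing values.
counts <- toy_avg[!is.na(elev_cat) & !is.na(region),
                  .N, by = .(elev_cat, region)][order(elev_cat, region)]
print(counts)
```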

6. Use stat_summary to examine mean dew point and wind speed by region with standard deviation error bars

  • Make sure to remove NAs
  • Use fun.data="mean_sdl" in stat_summary
  • Add another layer of stat_summary but change the geom to "errorbar" (see the help).
  • Describe the graph and what you observe
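met_avg %>%
  filter(!is.na(dew.point)) %>%
  ggplot(mapping = aes(x = region, y = dew.point)) +
  # mean_sdl (which needs Hmisc) spans mean ± 2 SD by default;
  # add fun.args = list(mult = 1) for ± 1 SD
  stat_summary(fun.data = "mean_sdl", geom = "errorbar") +
  stat_summary(fun.data = "mean_sdl")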

  • Dew point is a measure of atmospheric moisture. It indicates the temperature at which air becomes saturated and dew forms. A higher dew point means more moisture in the air, leading to muggy conditions, while a lower dew point indicates drier air.

  • Wind speed is the rate at which air moves horizontally across the Earth’s surface, usually measured in meters per second or miles per hour. It affects weather patterns, evaporation, and perceived temperature. In this dataset, wind speed may be analyzed to see how it correlates with dew point, temperature, or geographic region.
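
The step also asks for wind speed, which can be summarised the same way. The sketch below is one possible approach on simulated data: instead of `mean_sdl`, it passes a small helper, `mean_sd()` (a hypothetical name introduced here), so the bars are exactly mean ± 1 SD and the Hmisc package is not required.

```r
library(ggplot2)

# Helper for stat_summary(fun.data = ...): it must return a data frame
# with the columns y, ymin and ymax.
mean_sd <- function(x) {
  m <- mean(x)
  s <- sd(x)
  data.frame(y = m, ymin = m - s, ymax = m + s)
}

set.seed(7)
# Simulated station means; one value is NaN on purpose.
toy_avg <- data.frame(
  region  = rep(c("north east", "north west", "south east", "south west"),
                each = 30),
  wind.sp = rnorm(120, mean = 2.5, sd = 0.7)
)
toy_avg$wind.sp[5] <- NaN

# Drop missing values first, then draw error bars and mean points.
toy_ok <- toy_avg[!is.na(toy_avg$wind.sp), ]
p <- ggplot(toy_ok, aes(x = region, y = wind.sp)) +
  stat_summary(fun.data = mean_sd, geom = "errorbar", width = 0.2) +
  stat_summary(fun.data = mean_sd, geom = "point") +
  labs(x = "Region", y = "Wind speed (m/s)")
print(p)
```

Writing the summary function by hand makes the width of the error bars explicit, which matters when describing the spread in the text.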

7. Make a map showing the spatial trend in relative humidity in the US

  • Make sure to remove NAs
  • Use leaflet()
  • Make a color palette with custom colors
  • Use addMarkers to include the top 10 places in relative humidity (hint: this will be useful rank(-rh) <= 10)
  • Add a legend
met_avg2 <- met_avg[!is.na(rh)]

# Top 10 stations by relative humidity
top10 <- met_avg2[rank(-rh) <= 10]

rh_pal <- colorNumeric(c('blue','purple','red'), domain=met_avg2$rh)
leaflet(met_avg2) %>%
  addProviderTiles('OpenStreetMap') %>%
  addCircles(lat=~lat, lng=~lon, color=~rh_pal(rh), label=~paste0(round(rh,2), ' rh'), opacity=1,fillOpacity=1, radius=500) %>%
  addMarkers(lat=~lat, lng=~lon, label=~paste0(round(rh,2), ' rh'), data = top10) %>%
  addLegend('bottomleft',pal=rh_pal, values=met_avg2$rh, title="Relative Humidity", opacity=1)
  • Describe the trend in RH across the US

Relative humidity is highest in the southeastern US and coastal regions due to moisture from the Gulf of Mexico and the oceans, and lowest in the western interior and desert areas due to arid conditions and higher elevation.
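
The hint `rank(-rh) <= 10` selects the highest values because negating the vector turns the largest value into rank 1. A short sketch with made-up humidity values shows the mechanics, and also how ties behave under the default `ties.method = "average"`.

```r
# Made-up relative humidity values; positions 2 and 4 are tied for the top.
rh <- c(55.2, 91.0, 78.4, 91.0, 60.1, 88.3)

# Negating flips the order, so the highest humidity gets the lowest rank.
r <- rank(-rh)
print(r)

# Positions of the three most humid stations.
top3 <- which(r <= 3)
print(top3)

# With averaged ties, the two tied leaders both get rank 1.5, so "<= 1"
# selects nothing; ties.method = "min" gives both of them rank 1.
none_selected <- which(rank(-rh) <= 1)
both_selected <- which(rank(-rh, ties.method = "min") <= 1)
```

On real station means exact ties are unlikely, but the choice of `ties.method` decides whether a cutoff can return fewer or more rows than requested.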

8. Use a ggplot extension

  • Pick an extension (except cowplot) from here and make a plot of your choice using the met data (or met_avg)
  • You might want to try examples that come with the extension first (e.g. ggtech, gganimate, ggforce)
library(ggplot2)
met_avg %>%
  filter(!is.na(temp), !is.na(wind.sp)) %>%
  ggplot(aes(x = temp, y = wind.sp)) +
  geom_point(aes(color = region)) +
  labs(title = "Temperature vs Wind Speed by Region",
       x = "Temperature (°C)", 
       y = "Wind Speed (m/s)",
       color = "Region") +
  theme_minimal()
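
Since this step asks for a ggplot extension, one option is `geom_sina()` from ggforce, which spreads the individual points out to follow the shape of the violin. The sketch below assumes ggforce is installed and uses simulated temperatures; `toy_avg` and its values are invented, and on the real data the same layers would be applied to `met_avg` after dropping missing values.

```r
library(ggplot2)
library(ggforce)  # assumption: installed, e.g. via install.packages("ggforce")

set.seed(3)
# Simulated station mean temperatures, with a different mean per region.
toy_avg <- data.frame(
  region = rep(c("north east", "north west", "south east", "south west"),
               each = 40),
  temp   = rnorm(160, mean = rep(c(21, 19, 27, 25), each = 40), sd = 2)
)

# Violin outline plus the individual stations arranged by local density.
p <- ggplot(toy_avg, aes(x = region, y = temp, color = region)) +
  geom_violin() +
  geom_sina(alpha = 0.6) +
  labs(title = "Station mean temperature by region (simulated)",
       x = "Region", y = "Temperature (°C)", color = "Region") +
  theme_minimal()
print(p)
```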